ABSTRACT

This paper presents a simple and exact test for detecting
a monotonic relation between the mean and variance in linear
regression through the origin. The test results from applying
the Goldfeld-Quandt peak test to the uncorrelated Theil
residuals. A numerical example is provided to illustrate the
method. A simulation experiment was performed to compare the
empirical power of this test with those of the existing tests.


1. INTRODUCTION


Consider the simple linear model $Y = X\beta + \varepsilon$, where $Y$ is
an n-dimensional random vector of observations, $X$ is an
n-dimensional vector consisting of known nonstochastic elements,
$\beta$ is an unknown scalar, $\varepsilon$ is an n-dimensional random
vector, and

$$E[\varepsilon] = 0, \qquad E[\varepsilon\varepsilon'] = \sigma^2 I_n, \tag{1.1}$$

where $\sigma^2 > 0$ is an unknown parameter and $I_n$ is the $n \times n$
identity matrix.

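The least squares (LS) estimator $b$ of $\beta$ and the least
squares predictor $e$ of $\varepsilon$ are given by

$$b = \Big(\sum_{i=1}^{n} x_i y_i\Big)\Big(\sum_{i=1}^{n} x_i^2\Big)^{-1}, \quad \text{and}$$

$$e = Y - Xb = Y - X\Big(\sum_{i=1}^{n} x_i y_i\Big)\Big(\sum_{i=1}^{n} x_i^2\Big)^{-1} = PY,$$

where

$$P = I_n - \Big(\sum_{i=1}^{n} x_i^2\Big)^{-1} XX'.$$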

Under the assumptions (1.1),

$$E[e] = PX\beta = 0 \cdot \beta = 0,$$

$$E[ee'] = P\,E[YY']\,P' = P\,E[\varepsilon\varepsilon']\,P' = \sigma^2 P.$$

Hence it is clear that even when (1.1) holds, the LS residuals
are neither independent nor of constant variance, since
$P \neq I_n$.
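The point about the LS residuals can be checked numerically. The following is a minimal sketch, assuming a small hypothetical regressor vector `x` (not taken from the paper) and $\sigma^2 = 1$; it builds $P$ and shows that its diagonal entries differ and its off-diagonal entries are nonzero.

```python
# Residual-maker matrix P = I_n - x x' / (x'x) for regression through the origin.
x = [1.0, 2.0, 3.0, 4.0]          # hypothetical regressor values (assumption)
n = len(x)
sxx = sum(v * v for v in x)       # sum of squares of x

P = [[(1.0 if i == j else 0.0) - x[i] * x[j] / sxx for j in range(n)]
     for i in range(n)]

# Under (1.1) with sigma^2 = 1, Cov(e) = P.
variances = [P[i][i] for i in range(n)]   # unequal diagonal entries
cov_12 = P[0][1]                          # nonzero off-diagonal entry

# P annihilates x, which is why E[e] = P x beta = 0.
Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
print(variances, cov_12, Px)
```

The residual variances shrink as $x_i$ grows, so a pattern in the LS residuals need not reflect a pattern in the errors.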

Goldfeld and Quandt [1965] present two exact tests of the
hypothesis that the residuals from a least squares regression
are homoscedastic. The first test is parametric and uses the
F-statistic. The second test is nonparametric and uses the
number of peaks in the ordered sequence of unsigned residuals.
Hedayat and Robson [1970], among other results, have
demonstrated the failure of the Goldfeld and Quandt peak test
when applied to LS residuals. One reason for the failure is
that least squares residuals, even under ideal conditions,
are in general correlated and have different variances.

In this paper, we work with a different type of residuals
which are free from the above criticism. We will use the new
residuals to detect a monotonic relation between the mean and
variance by means of the peak test introduced by Goldfeld and
Quandt [1965].


2. T-RESIDUALS AND THEIR PROPERTIES IN
SIMPLE LINEAR REGRESSION THROUGH THE ORIGIN

Theil [1965] has presented a predictor of $\varepsilon$ (designated
by T-residuals) which has all the ordinary properties of $e$
except that the covariance matrix of T-residuals is $\sigma^2 I_{n-1}$
under the assumption (1.1). Koerts [1967] derived the
explicit form of the T-residuals for the simple linear model
through the origin. Following Koerts, the elements of the
vector of T-residuals $e^*$ can be represented by


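$$e_i^* = y_i - b^* x_i, \qquad i = 1, 2, \ldots, n, \quad i \neq k,$$

where

$$b^* = \Big(\sum_{i=1}^{n} x_i^2\Big)^{-1/2} y_k + \Big[1 - x_k\Big(\sum_{i=1}^{n} x_i^2\Big)^{-1/2}\Big] \Big(\sum_{i \neq k} x_i y_i\Big)\Big(\sum_{i \neq k} x_i^2\Big)^{-1}.$$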


In the above expression k can take any value from 1 to n.
The properties of T-residuals are the following:

(i) $e^*$ is a linear function of the $y_i$,

(ii) $E[e_i^*] = 0$, $i = 1, 2, \ldots, n$, $i \neq k$,

(iii) $\mathrm{Cov}[e_i^*, e_j^*] = 0$ if $i \neq j$, and $= \sigma^2$ if $i = j$,
where $i, j = 1, 2, \ldots, n$; $i, j \neq k$,

(iv) The T-residuals have a minimum expected sum of
squares of errors $(e_i^* - \varepsilon_i)$ in the class of
predictors satisfying properties (i), (ii) and
(iii), and

(v) $\sum_{i \neq k} e_i^{*2} = \sum_{i=1}^{n} e_i^2$.

In light of the remarks we made earlier, properties (iii)
and (v) make the T-residuals very interesting indeed.
T-residuals have been derived based on the first four
properties, and Koerts [1967] has shown that they also have
the fifth property.
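Properties (ii), (iii) and (v) can be verified numerically. The sketch below assumes that $b^*$ is the linear combination $S^{-1/2} y_k + c_1 \sum_{t \neq k} x_t y_t$ with $S = \sum_t x_t^2$ and $c_1 = (1 - x_k S^{-1/2}) / \sum_{t \neq k} x_t^2$; the function name `t_residual_matrix` and the data are hypothetical and serve only as an illustration.

```python
import math

def t_residual_matrix(x, k):
    """Rows of C such that e* = C y for the T-residuals omitting index k (0-based)."""
    n = len(x)
    s = sum(v * v for v in x)                  # sum over all t of x_t^2
    s_k = s - x[k] ** 2                        # same sum with t = k left out
    c1 = (1.0 - x[k] / math.sqrt(s)) / s_k
    rows = []
    for i in range(n):
        if i == k:
            continue
        row = []
        for t in range(n):
            # weight of y_t in b* (assumed closed form)
            w = 1.0 / math.sqrt(s) if t == k else c1 * x[t]
            row.append((1.0 if t == i else 0.0) - x[i] * w)
        rows.append(row)
    return rows

x = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical regressors
y = [1.2, 1.7, 3.5, 3.9, 5.4]     # hypothetical responses
k = 2                             # omit the middle observation
C = t_residual_matrix(x, k)

# Property (iii): C C' should be the identity matrix of order n - 1.
gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in C] for r1 in C]
# Property (ii): C x = 0, so the T-residuals have zero mean for any beta.
Cx = [sum(c * v for c, v in zip(row, x)) for row in C]
# Property (v): the T-residual sum of squares equals the LS one.
e_star = [sum(c * v for c, v in zip(row, y)) for row in C]
b = sum(a * v for a, v in zip(x, y)) / sum(v * v for v in x)
ss_ls = sum((v - b * a) ** 2 for a, v in zip(x, y))
ss_t = sum(e * e for e in e_star)
print(ss_ls, ss_t)
```

Working with the matrix `C` rather than with $b^*$ directly makes the covariance check a single product, since $\mathrm{Cov}[e^*] = \sigma^2 CC'$ under (1.1).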

3. A SIMPLE AND EXACT TEST WHICH DETECTS MONOTONICITY OF
VARIANCES IN SIMPLE LINEAR REGRESSION THROUGH THE ORIGIN

Consider the case where the $x_i$'s have been ordered such
that $x_i < x_j$ if $i < j$, and suppose our interest lies in
testing the following hypothesis:

$$H_0: E[\varepsilon_i^2] = \sigma^2 \quad \text{against} \quad H_1: E[\varepsilon_i^2] = \sigma_i^2 < E[\varepsilon_j^2] = \sigma_j^2 \ \text{ for } i < j. \tag{3.1}$$

Note that the alternative hypothesis says that as x increases
the variance of $\varepsilon$ or y also increases. We are considering
the case where we have only a single observation for each level
of x, as is frequently the case.

Two alternative tests for testing $H_0$ against $H_1$ are
suggested by Goldfeld and Quandt [1965], namely:

(i) The F test

With T-residuals, the obvious choice for k is the middle
observation, so that one can compute the ratio of the sum of
squares of the first (n - 1)/2 predicted residuals to that of
the last (n - 1)/2, which is F distributed under $H_0$ and
normality. When n - 1 is not even, one can use either the
first (n - 2)/2 and the last n/2 observations or the first n/2
and the last (n - 2)/2 observations; for this choice see
Theil [1965].

(ii) The Peak test

For residuals ordered by the ordering of the $x_i$,
$x_j < x_{j+1}$, define a peak at $x_i$ to be an instance where
$|e_i| > |e_j|$ for $j = 1, 2, \ldots, i - 1$.

The validity of applying the Goldfeld-Quandt peak test to
the T-residuals is seen by noting that under $H_0$ the $e_i^*$'s
are uncorrelated, so that under the normality assumption they
will be independent.
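The peak count is a running-maximum scan over the absolute residuals. A minimal sketch follows; the function name `count_peaks` and the input sequence are hypothetical, and the first residual is assumed not to count as a peak, which is the convention consistent with the count reported in the numerical illustration.

```python
def count_peaks(residuals):
    """Count positions i >= 2 with |e_i| > |e_j| for every j < i."""
    peaks = 0
    running_max = abs(residuals[0])     # the first residual is not a peak
    for e in residuals[1:]:
        if abs(e) > running_max:
            peaks += 1
            running_max = abs(e)
    return peaks

# Hypothetical residuals, already ordered by increasing x.
print(count_peaks([0.5, -1.2, 0.3, 2.0, -1.9, -2.4]))  # 3
```

Only the ordering of the absolute values matters, so the count is unchanged if all residuals are multiplied by the same positive constant.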

In the class of regressions restricted by the conditions
that the $x_i$'s are positive and

$$\frac{\sigma_i^2}{\sigma_j^2} \le \frac{(c_1 x_j^2 - 1)^2 - c_1^2 x_i^2 x_j^2}{(c_1 x_i^2 - 1)^2 - c_1^2 x_i^2 x_j^2},$$

where $c_1$ is given below, we show that under $H_1$,
$\mathrm{var}[e_i^*] < \mathrm{var}[e_j^*]$. This means that especially in such
settings a greater sensitivity can be expected of the peak
test based on the T-residuals than of the F-test, which
is a general test.

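THEOREM 3.1. If $E[\varepsilon_i \varepsilon_j] = 0$, $i \neq j$, and
$E[\varepsilon_i^2] = \sigma_i^2 < E[\varepsilon_j^2] = \sigma_j^2$, then
$\mathrm{var}[e_i^*] < \mathrm{var}[e_j^*]$ if $x_t > 0$ for all $t$ and

$$\frac{\sigma_i^2}{\sigma_j^2} \le \frac{(c_1 x_j^2 - 1)^2 - c_1^2 x_i^2 x_j^2}{(c_1 x_i^2 - 1)^2 - c_1^2 x_i^2 x_j^2}.$$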

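Proof. Under these assumptions and by the definition of $e_i^*$,

$$\mathrm{var}[e_i^*] = E[e_i^{*2}] - (E[e_i^*])^2 = E[e_i^{*2}]$$

$$= \sigma_i^2 (c_1 x_i^2 - 1)^2 + c_3 c_1^2 x_i^2 + c_1^2 x_i^2 x_j^2 \sigma_j^2 + (\sigma_k^2 x_i^2) \Big/ \sum_{t=1}^{n} x_t^2,$$

where

$$c_1 = \Big[1 - x_k \Big(\sum_{t=1}^{n} x_t^2\Big)^{-1/2}\Big] \Big(\sum_{t \neq k} x_t^2\Big)^{-1},$$

$$c_2 = \sigma_k^2 \Big(\sum_{t=1}^{n} x_t^2\Big)^{-1}, \quad \text{and}$$

$$c_3 = \sum_{t \neq i, j, k} x_t^2 \sigma_t^2.$$

Hence

$$\mathrm{var}[e_j^*] - \mathrm{var}[e_i^*] = \sigma_j^2 (c_1 x_j^2 - 1)^2 - \sigma_i^2 (c_1 x_i^2 - 1)^2 + c_3 c_1^2 (x_j^2 - x_i^2)$$
$$\qquad + c_1^2 x_i^2 x_j^2 \sigma_i^2 - c_1^2 x_i^2 x_j^2 \sigma_j^2 + c_2 (x_j^2 - x_i^2).$$

Since $x_i < x_j$ and they are positive, the terms in $c_2$ and
$c_3$ are positive, and it follows that in order to show
$\mathrm{var}[e_j^*] - \mathrm{var}[e_i^*] \ge 0$ it is sufficient to show that

$$\sigma_j^2 (c_1 x_j^2 - 1)^2 - \sigma_i^2 (c_1 x_i^2 - 1)^2 + c_1^2 x_i^2 x_j^2 \sigma_i^2 - c_1^2 x_i^2 x_j^2 \sigma_j^2 \ge 0,$$

and this will be true if and only if

$$\frac{\sigma_i^2}{\sigma_j^2} \le \frac{(c_1 x_j^2 - 1)^2 - c_1^2 x_i^2 x_j^2}{(c_1 x_i^2 - 1)^2 - c_1^2 x_i^2 x_j^2}.$$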
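The variance expression in the proof can be evaluated directly. The sketch below is an illustration under stated assumptions: $c_1 = (1 - x_k S^{-1/2}) / \sum_{t \neq k} x_t^2$ with $S = \sum_t x_t^2$, the x-values are those of the numerical illustration, and the alternative with error standard deviation proportional to x is chosen here only as an example. The function name `t_residual_variances` is hypothetical.

```python
import math

def t_residual_variances(x, sigma2, k):
    """var[e*_i], i != k, for uncorrelated errors with variances sigma2[t]."""
    n = len(x)
    s = sum(v * v for v in x)
    c1 = (1.0 - x[k] / math.sqrt(s)) / (s - x[k] ** 2)
    out = []
    for i in range(n):
        if i == k:
            continue
        var = sigma2[i] * (c1 * x[i] ** 2 - 1.0) ** 2       # own-error term
        var += c1 ** 2 * x[i] ** 2 * sum(                   # errors with t != i, k
            x[t] ** 2 * sigma2[t] for t in range(n) if t not in (i, k))
        var += sigma2[k] * x[i] ** 2 / s                    # error of the omitted observation
        out.append(var)
    return out

x = [13.6, 13.9, 21.1, 25.6, 26.4, 39.8, 40.1, 43.9, 51.9, 53.2, 65.2, 66.4, 67.7]
sigma2 = [v * v for v in x]        # assumed alternative: s.d. proportional to x
variances = t_residual_variances(x, sigma2, k=6)
increasing = all(a < b for a, b in zip(variances, variances[1:]))

# Sanity check: with equal error variances every T-residual has variance 1.
flat = t_residual_variances(x, [1.0] * len(x), k=6)
print(increasing, flat[0], flat[-1])
```

Under this alternative the T-residual variances increase with x, which is the ordering the peak test is designed to pick up.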


4. A NUMERICAL ILLUSTRATION

To illustrate the use of our peak test we go through a
complete example. Let us consider the example (see Table I)
given on page 180 of Steel and Torrie [1960]. As these authors
have pointed out, in this instance the regression line should
pass through the origin. The LS estimate is $b = 3.67$, and
hence the regression line is given by $y = 3.67x$. The
individual least squares residuals, after rounding to one
decimal place, are given in Table I.


TABLE I

Induced reversions to independence per 10 surviving cells
y per dose (ergs/bacterium) $10^{-5}x$ of streptomycin-dependent
Escherichia coli subjected to monochromatic ultraviolet
radiation of 2,967 Angstroms wave length.

       x         y         e

     13.6       52        2.0
     13.9       48       -3.1
     21.1       72       -5.5
     25.6       89       -5.1
     26.4       80      -17.0
     39.8      130      -16.2
     40.1      139       -8.3
     43.9      173       11.7
     51.9      208       17.3
     53.2      225       29.5
     65.2      259       19.5
     66.4      199      -45.0
     67.7      255        6.3

First of all, visual examination of these residuals suggests
that there is a pattern in the distribution of plus and minus
signs among the $e_i$'s. Secondly, a graphical plot of the
residuals against the fitted values or x-values strongly
suggests that the error variance increases with x. Now,
suppose we suspect the assumption $E[\varepsilon_i^2] = \sigma^2$ for all i,
and in particular we suspect that the variance may increase
with the mean, i.e. that the variance of y increases as
x increases. To test against this alternative hypothesis
we first compute the T-residuals. We note that under $H_0$
the distribution of the number of peaks is independent of
the choice of k; the choice therefore depends primarily on
the power of the test with respect to a specific alternative
hypothesis. However, it seems that the index of the
middle observation would be a reasonable choice of k
for our general $H_1$. Recall that $H_1$ puts no restriction
on the monotonicity structure of the variance other than
its being increasing. If we let k = 7, we have

$$e_i^* = y_i - b^* x_i, \qquad i = 1, 2, \ldots, 6, 8, \ldots, 13, \tag{4.1}$$

where $b^* = 3.63$. Thus, the individual T-residuals, after
rounding to one decimal place, are as follows:

    $e_1^*$ =  +2.6        $e_8^*$    = +13.5
    $e_2^*$ =  -2.5        $e_9^*$    = +19.4
    $e_3^*$ =  -4.6        $e_{10}^*$ = +31.7
    $e_4^*$ =  -4.0        $e_{11}^*$ = +22.1
    $e_5^*$ = -15.9        $e_{12}^*$ = -42.2
    $e_6^*$ = -14.6        $e_{13}^*$ =  +9.0

The number of peaks is 5.
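The computation above can be retraced from the data of Table I. This is a sketch under the assumption that $b^*$ has the closed form $S^{-1/2} y_k + c_1 \sum_{t \neq k} x_t y_t$, with $S = \sum_t x_t^2$ and $c_1 = (1 - x_k S^{-1/2}) / \sum_{t \neq k} x_t^2$; individual residuals computed this way can differ from the listed values in the last printed digit because of rounding.

```python
import math

# Data of Table I (x = dose, y = induced reversions).
x = [13.6, 13.9, 21.1, 25.6, 26.4, 39.8, 40.1, 43.9, 51.9, 53.2, 65.2, 66.4, 67.7]
y = [52, 48, 72, 89, 80, 130, 139, 173, 208, 225, 259, 199, 255]
k = 6                                   # 0-based index of the 7th observation

s = sum(v * v for v in x)
sxy = sum(a * v for a, v in zip(x, y))
b = sxy / s                             # least squares slope

# Assumed closed form of b* (see the lead-in above).
c1 = (1.0 - x[k] / math.sqrt(s)) / (s - x[k] ** 2)
b_star = y[k] / math.sqrt(s) + c1 * (sxy - x[k] * y[k])

t_res = [y[i] - b_star * x[i] for i in range(len(x)) if i != k]

# Peaks: absolute residuals exceeding all earlier ones (first one excluded).
peaks, running_max = 0, abs(t_res[0])
for e in t_res[1:]:
    if abs(e) > running_max:
        peaks, running_max = peaks + 1, abs(e)

print(round(b, 2), round(b_star, 2), peaks)   # 3.67 3.63 5
```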

The $e_i^*$'s are independent and identically distributed
under the homoscedasticity and normality assumptions on the
$\varepsilon_i$'s. Now, we can compute the probability of obtaining five
or more peaks in a sequence of 12 independent and identically
distributed random variables using Table I from Goldfeld and
Quandt [1965]. By interpolation from this table we see that
this probability is about .036. If we can accept a risk of
3.6 percent, then we should fit a weighted regression rather
than the unweighted one to obtain an efficient estimate
of $\beta$ and hence of the regression line.
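The tail probability can also be computed without a table. The sketch below assumes independent, identically distributed continuous observations, for which the i-th value is a new maximum with probability 1/i independently of the earlier ones; the function name `peak_count_pmf` is hypothetical. Under these assumptions the exact value comes out near .034, close to the interpolated .036.

```python
def peak_count_pmf(m):
    """P(number of peaks = j), j = 0..m-1, for m i.i.d. continuous observations."""
    pmf = [1.0]                          # one observation: zero peaks
    for i in range(2, m + 1):
        p = 1.0 / i                      # chance that observation i is a new maximum
        new = [0.0] * (len(pmf) + 1)
        for j, q in enumerate(pmf):
            new[j] += q * (1.0 - p)      # no peak at position i
            new[j + 1] += q * p          # peak at position i
        pmf = new
    return pmf

pmf = peak_count_pmf(12)
p_value = sum(pmf[5:])                   # P(5 or more peaks among 12 residuals)
print(round(p_value, 3))                 # 0.034
```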

5.  SIMULATION  STUDY 

We consider the simple model $y_i = x_i(\beta + \varepsilon_i)$,
$i = 1, 2, \ldots, n$. Sampling experiments were performed on
this model in order to obtain empirical estimates of the
powers of three tests: 1) the F-test, 2) the Goldfeld-Quandt
peak test on LS residuals and 3) the peak test based on the
uncorrelated T-residuals. The independent variable was
identical in repeated samples, and each particular sample of
x's was chosen from the uniform distribution with mean
$\mu_x$ = 30, 40, 50 and standard deviation $\sigma_x$ = 10, 20, 25.
The total number of observations was 31. For each
$(\mu_x, \sigma_x)$ combination, one sample of x's was generated, and
for each such sample 1000 samples of 31 $\varepsilon$-values were
generated. In our simulation study we considered three
distributions for the errors $\varepsilon$:

a) the normal distribution with zero mean and unit variance,

b) Student's t with 2 degrees of freedom (d.f.), and

c) the adjusted chi-square distribution with 4 d.f., adjusted
so that the mean is equal to zero.

Uniform pseudorandom numbers were generated by a
multiplicative-congruential method on an IBM 360/65. The
uniform variates were used to form observations from the
distributions studied: the Gaussian by a modification of the
Box-Muller method; the chi-square with 4 d.f. as -2 times
the logarithm of the product of 2 independent uniform
random numbers; and the t with 2 d.f. as the ratio of a
Gaussian and the square root of a chi-square with 2 d.f.
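The three error generators can be sketched from uniform variates as follows. This is an illustration, not the original program: it uses Python's built-in generator in place of the multiplicative-congruential one, the basic Box-Muller transform rather than the unspecified modification, and it divides the chi-square by its degrees of freedom in the t ratio, which is an assumption. All function names are hypothetical.

```python
import math
import random

rng = random.Random(12345)       # arbitrary seed, for reproducibility only

def uniform_open():
    """Uniform variate in (0, 1], so that log() is always defined."""
    return 1.0 - rng.random()

def normal_error():
    # Basic Box-Muller transform of two independent uniforms.
    radius = math.sqrt(-2.0 * math.log(uniform_open()))
    return radius * math.cos(2.0 * math.pi * rng.random())

def chi2_4_error():
    # -2 log(U1 * U2) is chi-square with 4 d.f.; subtract the mean 4 to centre it.
    return -2.0 * math.log(uniform_open() * uniform_open()) - 4.0

def t2_error():
    # Gaussian over sqrt(chi-square_2 / 2); the division by 2 is an assumption.
    chi2_2 = -2.0 * math.log(uniform_open())
    return normal_error() / math.sqrt(chi2_2 / 2.0)

errors = [chi2_4_error() for _ in range(31)]   # one sample of 31 error values
print(len(errors))
```

In the model of this section each error would then enter as $y_i = x_i(\beta + \varepsilon_i)$.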

The Monte Carlo results for the various distributions
are given in Table II. The simulation results clearly
establish the superiority of the peak test based on
T-residuals over the other two tests in the case of the normal
and chi-square distributions. In the case of t with 2 d.f.,
the F-test compares favorably with the peak test on
T-residuals.
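A power estimate of this kind can be sketched as a small Monte Carlo loop. The code below is only an illustration under stated assumptions: normal errors, the closed form of $b^*$ assumed earlier, a non-randomized critical value from the exact null distribution of the peak count, and a homoscedastic model $y_i = \beta x_i + \varepsilon_i$ as the null case. It is not the original program and is not expected to reproduce the entries of Table II. All function names are hypothetical.

```python
import math
import random

def t_residuals(x, y, k):
    """T-residuals omitting index k, using the assumed closed form of b*."""
    s = sum(v * v for v in x)
    sxy = sum(a * v for a, v in zip(x, y))
    c1 = (1.0 - x[k] / math.sqrt(s)) / (s - x[k] ** 2)
    b_star = y[k] / math.sqrt(s) + c1 * (sxy - x[k] * y[k])
    return [y[i] - b_star * x[i] for i in range(len(x)) if i != k]

def count_peaks(res):
    peaks, running_max = 0, abs(res[0])
    for e in res[1:]:
        if abs(e) > running_max:
            peaks, running_max = peaks + 1, abs(e)
    return peaks

def critical_value(m, alpha):
    """Smallest c with P(peaks >= c) <= alpha for m i.i.d. continuous values."""
    pmf = [1.0]
    for i in range(2, m + 1):
        new = [0.0] * (len(pmf) + 1)
        for j, q in enumerate(pmf):
            new[j] += q * (1.0 - 1.0 / i)
            new[j + 1] += q / i
        pmf = new
    tail = 0.0
    for c in range(len(pmf) - 1, -1, -1):
        if tail + pmf[c] > alpha:
            return c + 1
        tail += pmf[c]
    return 0

def rejection_rate(x, beta, heteroscedastic, reps, rng):
    k = len(x) // 2                          # omit the middle observation
    crit = critical_value(len(x) - 1, 0.05)
    hits = 0
    for _ in range(reps):
        if heteroscedastic:                  # model of this section
            y = [xi * (beta + rng.gauss(0.0, 1.0)) for xi in x]
        else:                                # assumed null model
            y = [xi * beta + rng.gauss(0.0, 1.0) for xi in x]
        hits += count_peaks(t_residuals(x, y, k)) >= crit
    return hits / reps

rng = random.Random(2024)                    # arbitrary seed
half_width = 10.0 * math.sqrt(3.0)           # uniform with mean 30 and s.d. 10
x = sorted(rng.uniform(30.0 - half_width, 30.0 + half_width) for _ in range(31))
size = rejection_rate(x, beta=1.0, heteroscedastic=False, reps=2000, rng=rng)
power = rejection_rate(x, beta=1.0, heteroscedastic=True, reps=2000, rng=rng)
print(size, power)
```

Because the peak count is discrete, the non-randomized test used here has actual size below .05, so its rejection rates are conservative relative to a test of exact nominal size.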
TABLE II

Empirical Power for Nominal Size of .05

a) Distribution of errors: normal, mean = 0, variance = 1

  $\mu_x$   $\sigma_x$    F-test    Peak Test on    Peak Test on
                                    LS Residuals    T-Residuals

    30        10        .045        .015            .144
              20        .023        .008            .751
              25        .018        .006            .521
    40        10        .052        .020            .08
              20        .031        .015            .339
              25        .025        .007            .669
    50        10        .052        .02             .059
              20        .039        .016            .209
              25        .031        .015            .339

b) Distribution of errors: t with 2 d.f.

    30        10        .206        .012            .081
              20        .156        .011            .426
              25        .128        .008            .253
    40        10        .22         .013            .050
              20        .180        .012            .157
              25        .157        .015            .360
    50        10        .233        .014            .037
              20        .199        .015            .106
              25        .181        .012            .157

c) Distribution of errors: adjusted chi-square with 4 d.f.

    30        10        .144        .015            .13
              20        .164        .018            .617
              25        .165        .018            .402
    40        10        .143        .018            .075
              20        .161        .015            .302
              25        .164        .017            .548
    50        10        .136        .016            .058
              20        .149        .013            .174
              25        .161        .015            .302
